Computational Comparison and Classification of Dialects

Author

  • Wilbert Heeringa

Abstract

In this paper a range of methods for measuring the phonetic distance between dialectal variants is described. These are variants of the frequency method, the frequency per word method and Levenshtein distance, both simple (based on atomic characters) and complex (based on feature bundles). The measurements between feature bundles used Manhattan distance, Euclidean distance or (a measure using) Pearson's correlation coefficient. Variants of these using feature weighting by entropy reduction were systematically compared, as was the representation of diphthongs (as one symbol or two). The dialects were compared with each other directly and indirectly via a standard dialect. The results of comparison were classified by clustering and by training of a Kohonen map. The results were compared to well-established scholarship in dialectology, yielding a calibration of the methods. These results indicate that the frequency per word method and the Levenshtein distance outperform the frequency method, that feature representations are more sensitive, that Manhattan distance and Euclidean distance are good measures of phonetic overlap of feature bundles, that weighting is not useful, that two-phone representations of diphthongs mostly outperform one-phone representations, and that dialects should be directly compared to each other. Clustering gives the sharper classification, but the Kohonen map is a useful supplement.
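
The difference between the simple and the complex Levenshtein variants named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the feature table `FEATURES`, its three dimensions and values, the normalisation of the bundle distances, and the insertion/deletion cost of 1.0 are all assumptions made here for illustration, and the toy table covers only five vowels.

```python
from math import sqrt
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical toy feature bundles (height, backness, rounding), each in [0, 1].
# The values are illustrative assumptions, not the feature system of the paper.
FEATURES: Dict[str, Tuple[float, float, float]] = {
    "i": (1.0, 0.0, 0.0),
    "e": (0.5, 0.0, 0.0),
    "a": (0.0, 0.5, 0.0),
    "o": (0.5, 1.0, 1.0),
    "u": (1.0, 1.0, 1.0),
}


def atomic_cost(a: str, b: str) -> float:
    """Simple variant: segments are atomic, so they are equal or different."""
    return 0.0 if a == b else 1.0


def manhattan_cost(a: str, b: str) -> float:
    """Complex variant: Manhattan distance between bundles, scaled to [0, 1]."""
    fa, fb = FEATURES[a], FEATURES[b]
    return sum(abs(x - y) for x, y in zip(fa, fb)) / len(fa)


def euclidean_cost(a: str, b: str) -> float:
    """Complex variant: Euclidean distance between bundles, scaled to [0, 1]."""
    fa, fb = FEATURES[a], FEATURES[b]
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb))) / sqrt(len(fa))


def levenshtein(
    s: Sequence[str],
    t: Sequence[str],
    sub_cost: Callable[[str, str], float] = atomic_cost,
    indel: float = 1.0,  # assumed cost of one insertion or deletion
) -> float:
    """Cheapest total cost of insertions, deletions and substitutions s -> t."""
    prev = [j * indel for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [i * indel]
        for j, b in enumerate(t, 1):
            cur.append(min(
                prev[j] + indel,               # delete a
                cur[j - 1] + indel,            # insert b
                prev[j - 1] + sub_cost(a, b),  # substitute a by b
            ))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("ie", "ia"))                  # atomic segments
    print(levenshtein("ie", "ia", manhattan_cost))  # feature bundles
```

With atomic segments every substitution costs the same, whereas with feature bundles replacing a vowel by a similar vowel is cheaper than replacing it by a dissimilar one; this graded cost is what the abstract means by feature representations being more sensitive.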

Similar resources

The Short Vowels /i/ and /u/ in Iranian Balochi Dialects

The aim of the present paper is to study the status of the short vowels /i/ and /u/ in five selected Iranian Balochi dialects. These dialects are spoken in the Sistan (SI), Saravan (SA), Khash (KH), Iranshahr (IR), and Chabahar (CH) regions of Sistan va Baluchestan province in the southeast of Iran. This study investigates whether these two vowels have the same qualities as the short /i/ an...

On Comparing Classifiers

An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If not done very carefully, comparative studies of classification and other types of algorithms can easily result in statistically invalid conclusions. This is especially true when one is using data mining techniques to analyze very large...

Computational analysis of Gondi dialects

This paper presents a computational analysis of Gondi dialects spoken in central India. We present a digitized data set of the dialect area, and analyze the data using different techniques from dialectometry, deep learning, and computational biology. We show that the methods largely agree with each other and with the earlier non-computational analyses of the language group.

Learning from Relatives: Unified Dialectal Arabic Segmentation

Arabic dialects do not just share a common koiné, but there are shared pandialectal linguistic phenomena that allow computational models for dialects to learn from each other. In this paper we build a unified segmentation model where the training data for different dialects are combined and a single model is trained. The model yields higher accuracies than dialect-specific models, eliminating t...

Publication date: 2000